Determining the effects of lake variables on the relationship of TP (log transformed ug/L) with Chl-a concentrations (log transformed ug/L), using the gradient of residuals against variables.
#linear model of chlorophyll to phosphorus
lm1 <- lm(logCHLA~logTP, data=df)
#creating residuals
df$res.lm <- resid(lm1)plot.lm1 <- ggplot(df,aes(x=logTP,y=logCHLA))+
geom_point()+
geom_smooth(method='lm')+
labs(title="Linear log-log model",x="log TP",y="log CHLA")
print(plot.lm1)## `geom_smooth()` using formula 'y ~ x'
summary(lm1)##
## Call:
## lm(formula = logCHLA ~ logTP, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.6618 -0.4647 0.0533 0.5748 4.5179
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.92330 0.06886 86.01 <2e-16 ***
## logTP 1.03394 0.01596 64.79 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9418 on 6461 degrees of freedom
## Multiple R-squared: 0.3938, Adjusted R-squared: 0.3937
## F-statistic: 4198 on 1 and 6461 DF, p-value: < 2.2e-16
anova(lm1)## Analysis of Variance Table
##
## Response: logCHLA
## Df Sum Sq Mean Sq F value Pr(>F)
## logTP 1 3723.5 3723.5 4197.6 < 2.2e-16 ***
## Residuals 6461 5731.3 0.9
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Outliers were identified as points with studentized residuals outside 3:-3
plot.lm2<- ggplot(df2,aes(x=logTP,y=logCHLA))+
geom_point()+
geom_smooth(method='lm')+
labs(title="Linear log-log model without outliers",x="log TP",y="log CHLA")
print(plot.lm2)## `geom_smooth()` using formula 'y ~ x'
summary(lm2)##
## Call:
## lm(formula = logCHLA ~ logTP, data = df2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.98721 -0.46928 0.03469 0.53016 2.81926
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.30979 0.06214 101.55 <2e-16 ***
## logTP 1.11476 0.01440 77.41 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8247 on 6356 degrees of freedom
## Multiple R-squared: 0.4853, Adjusted R-squared: 0.4852
## F-statistic: 5993 on 1 and 6356 DF, p-value: < 2.2e-16
anova(lm2)## Analysis of Variance Table
##
## Response: logCHLA
## Df Sum Sq Mean Sq F value Pr(>F)
## logTP 1 4075.5 4075.5 5992.6 < 2.2e-16 ***
## Residuals 6356 4322.6 0.7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Pattern of logCHLA~logTP model residuals along gradients of variables. All variables are fitted to a linear model on the residuals.
# only keep relevant variables
df3 <- df2 %>%
mutate("Log(N:P)"=log(NPratio)) %>%
select(SAMPLE_DATE,
`DEPTH, SECCHI DISK DEPTH_SD_NA`,
`Log(N:P)`,
`TEMPERATURE, WATER_OW_NA`,
`TRUE COLOR_OW_T`,
res.lm) %>%
rename("Sample date"=SAMPLE_DATE,
"Secchi disk depth"=`DEPTH, SECCHI DISK DEPTH_SD_NA`,
"Lake temperature"=`TEMPERATURE, WATER_OW_NA`,
"True color"=`TRUE COLOR_OW_T`)
#loop
for(i in (2:ncol(df3)-1)){
lm.df <- df3[is.finite(df3[,i]),] #removing NAs from relevant column
lm.loop <- lm(res.lm ~ lm.df[,i], data=lm.df) #linear model of residuals against variable
p1 <- ggplot(df3,aes(x=df3[,i],y=res.lm))+ #scatterplot
geom_point()+
geom_smooth(method='lm')+
labs(x=paste(colnames(df3[i])),
y='residuals of LM(logCHLA~logTP)')
if(colnames(df3[i])=="Sample date"){
p1 <- p1+scale_x_date(date_labels="%Y")
}
p2 <- ggplot(data=df3, aes(x=df3[,i])) + #density plot
geom_density(fill="grey") +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank(),
axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank(),
panel.grid=element_blank(),
axis.line=element_blank(),
panel.background = element_blank())
require(gridExtra)
grid.arrange(p2, p1, ncol = 1, heights = c(1, 4)) #output plots
stats[nrow(stats)+1,] <- c(colnames(df3[i]), #coefficients into stats table
coef(lm.loop),
summary(lm.loop)$coefficients[2,4],
summary(lm.loop)$r.squared)
vars[nrow(vars)+1,] <- c(colnames(df3)[i], #coefficients into variables table
paste(min(as.numeric(na.omit(df3[,i]))),
"-",max(as.numeric(na.omit(df3[,i])))),
nrow(df3[i]),
paste(median(as.numeric(na.omit(df3[,i]))),
"(",summary(as.numeric(na.omit(df3[,i])))[2],
"-",summary(as.numeric(na.omit(df3[,i])))[5],")")) #stats
}## Error in FUN(X[[i]], ...): object 'True color' not found
## Error in eval(predvars, data, env): object 'True color' not found
## Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...): 0 (non-NA) cases
## Error in coef(alm): object 'alm' not found
## Error in coef(blm): object 'blm' not found
Variable descriptive statistics
| characteristic | range | n | median..IQR. |
|---|---|---|---|
| Sample date | 6007 - 18535 | 6358 | 12684.5 ( 8230 - 15878.5 ) |
| Secchi disk depth | 0.05 - 13.6 | 6358 | 3.3 ( 2 - 4.75 ) |
| Log(N:P) | -Inf - 5.64662406081625 | 6358 | 0.0143887374520995 ( -0.671485683778766 - 1.0929824056555 ) |
| Lake temperature | 4 - 36 | 6358 | 22.7777777777778 ( 20 - 25 ) |
| True color | 0 - 222 | 6358 | 10 ( 6 - 19 ) |
| ACID DEPOSITION | 213.6579857 - 1671.703552 | 2985 | 808.8840129 ( 465.2497559 - 975.3500061 ) |
| PRECIPITATION | 76.6439971923828 - 192.690994262695 | 2515 | 105.960250854492 ( 96.1419982910156 - 119.623184926382 ) |
| COLOR (LOW) | Inf - -Inf | 0 | NA ( NA - NA ) |
| COLOR (HIGH) | Inf - -Inf | 0 | NA ( NA - NA ) |
Coefficients for linear models of residuals along gradient of variables. Bold values indicate significant p-values (<0.05). Sorted by rsquared.
| characteristic | intercept | slope | p.value | rsquared |
|---|---|---|---|---|
| Secchi disk depth | 0.542224704653416 | -0.15345441870704 | 1.53166787498659e-173 | 0.120098166534876 |
| ACID DEPOSITION | -0.631338691547156 | 0.000830730528923877 | 1.64554161309906e-68 | 0.0975027001854195 |
| Sample date | 0.779767999996377 | -6.33576953898594e-05 | 1.33224366106376e-135 | 0.0921248014852935 |
| Lake temperature | 0.355842599068243 | -0.0155102786723908 | 2.54317977377244e-07 | 0.00438443901887856 |
| PRECIPITATION | 0.242481381755185 | -0.00110175983642297 | 0.208806730589723 | 0.000628530903058732 |
| Log(N:P) | -0.0138260955201857 | -0.0133570040649917 | 0.09600627689621 | 0.000547682855133057 |
| True color | 0.00550983361886227 | -0.000292078181873163 | 0.676455312693496 | 2.78514288861991e-05 |
## Error in `$<-.data.frame`(`*tmp*`, "DEPTH, BOTTOM_OW_NA", value = NA_real_): replacement has 1 row, data has 0
## Error in summary(lm.dep)$coefficients[2, 4]: subscript out of bounds
Variable descriptive statistics
| characteristic | years | range | n | median..IQR. |
|---|---|---|---|---|
| DEPTH, SECCHI DISK DEPTH_SD_NA | 2018 - 2020 | 0.65 - 11.85 | 275 | 3.9 ( 2.25 - 5.25 ) |
| NITROGEN, NITRATE-NITRITE_OW_T | 2018 - 2020 | 0.0035 - 0.677 | 275 | 0.0202 ( 0.01 - 0.0706 ) |
| NPratio | 2018 - 2020 | 0.0737463126843658 - 115.529010238908 | 275 | 1.42387732749179 ( 0.694444444444445 - 4.58333333333333 ) |
| TEMPERATURE, WATER_OW_NA | 2018 - 2020 | 15 - 30.5 | 275 | 25 ( 22 - 26.75 ) |
| TRUE COLOR_OW_T | 2018 - 2020 | 1 - 85 | 275 | 7 ( 6 - 13.25 ) |
| LAKE DEPTH | 2020 - 2020 | 11 - 11 | 9 | |
| ACID DEPOSITION (DIN:TP>1.5) | 2011 - 2018 | 213.6579857 - 772.1180725 | 82 | |
| ACID DEPOISITION (DIN:TP<1.5) | 2017 - 2018 | 213.6579857 - 712.899106 | 176 | |
| PRECIPITATION | 2013 - 2016 | 76.9628023420061 - 131.316277313232 | 126 | |
| NITRATE+NITRITE (DIN:TP<1.5) | 2017 - 2020 | 0.0035 - 0.118 | 140 | |
| NITRATE+NITRITE (DIN:TP>1.5) | 2011 - 2020 | 0.01 - 0.677 | 72 |
Coefficients for linear models of residuals along gradient of variables in most recent year. Bold values indicate significant p-values (<0.05). Sorted by rsquared.
| characteristic | intercept | slope | p.value | rsquared |
|---|---|---|---|---|
| DEPTH, SECCHI DISK DEPTH_SD_NA | 0.425775593905445 | -0.13559116117993 | 1.7863373757799e-10 | 0.141586467844989 |
| NITROGEN, NITRATE-NITRITE_OW_T | -0.0020484366413141 | -1.28553764785829 | 0.0276685293077767 | 0.0545088648472918 |
| NITRATE+NITRITE (DIN:TP>1.5) | -0.179167545093814 | -0.936519028703293 | 0.121102316724584 | 0.0349721672362281 |
| TRUE COLOR_OW_T | -0.195414603662009 | 0.00751088444361883 | 0.0333853963226258 | 0.0166534142441452 |
| ACID DEPOSITION (DIN:TP>1.5) | -0.865407063050843 | 0.0012385459042191 | 0.279225950206701 | 0.0153878974582628 |
| ACID DEPOSITION (DIN:TP<1.5) | -0.865407063050843 | 0.0012385459042191 | 0.279225950206701 | 0.0153878974582628 |
| PRECIPITATION | 0.657490394537693 | -0.0105336828412298 | 0.176576263884238 | 0.0151656732974757 |
| NPratio | -0.0717610669637638 | -0.00581393287736663 | 0.261077697959329 | 0.0144951145540703 |
| NITRATE+NITRITE (DIN:TP<1.5) | -0.0797417631596919 | 0.345465909212685 | 0.925006169562506 | 6.53877518406656e-05 |
| TEMPERATURE, WATER_OW_NA | -0.0776540042463184 | -0.00111232202755357 | 0.938772055759722 | 2.18931765096922e-05 |
High vs low color, sample date [1] “HIGH COLOR, N= 1444” [1] “LOW COLOR, N= 5017”
| characteristic | intercept | slope | p.value | rsquared |
|---|---|---|---|---|
| High color | 0.977892682796962 | -7.59397826560209e-05 | 2.44617881999168e-22 | 0.0681692927833179 |
| Low color | 0.76644641965883 | -6.26936827840964e-05 | 2.58755039850473e-119 | 0.104002177677548 |
Color trend, sample date [1] “INCREASING COLOR, N= 4250” [1] “DECREASING COLOR, N= 2108”
| characteristic | intercept | slope | p.value | rsquared |
|---|---|---|---|---|
| Increasing color | 0.861328156439938 | -7.24596124689312e-05 | 2.23795234284139e-115 | 0.115492445831473 |
| Decreasing color | 0.658100806089956 | -4.8852529203772e-05 | 8.67420733371484e-30 | 0.0592106598451909 |
Pattern of logCHLA~logTP model residuals along sample date for each lake. Data from each lake are fitted to a linear model on the residuals.
A red regression line indicates the model is statistically insignificant (p>0.05). A red border medians that the lake had a negative chlorophyll trend and a flat or positive phosphorous trend.
#each lake
#loop over each lake
LAKE_IDS <- unique(df$LAKE_ID) #list lakes
coeffs <- data.frame(lake=character(),
intercept=double(),
slope=double(),
'p-value'=double(),
rsquared=double()) #blank coefficient table
incongruous <- read.csv("~/OneDrive - New York State Office of Information Technology Services/Rscripts/Trend/chl-tp-lakes.csv")
incongruous <- incongruous[which(incongruous$CHL_slope!=incongruous$PHOS_slope),]
incongruous <- incongruous[incongruous$CHL_slope=="neg",]
for (i in 1:length(LAKE_IDS)) {
lake <- LAKE_IDS[[i]] #pick a lake
dat <- df2[df2$LAKE_ID == lake,] #create new data with that lake
resid.loop <- lm(logCHLA~logTP,data=dat) #linear model of chl vs tp
dat$res.loop <- resid(resid.loop) #residuals for that lake
lm.loop <- lm(res.loop~SAMPLE_DATE,data=dat) #linear model of residuals vs year
co <- coef(lm.loop)
plot <- ggplot(dat,aes(x=SAMPLE_DATE,y=res.loop))+ #plot lake
geom_point() +
labs(title=paste(lake),x='year',y='residuals of LM(logCHLA~logTP)')
if(summary(lm.loop)$coefficients[,4]<0.05){ #plot significant lakes
plot <- plot + geom_smooth(method='lm')
}else{
plot <- plot + geom_smooth(method='lm',aes(color="red"),show.legend=FALSE)
}
if(lake %in% incongruous$LAKE_ID){
plot <- plot + theme(panel.border = element_rect(colour = "red", fill=NA, size=3))
}else{}
print(plot) #print
coeffs[nrow(coeffs)+1,] <- c(lake,
co,
summary(lm.loop)$coefficients[2,4],
summary(lm.loop)$r.squared)
}Coefficients for linear models of residuals along gradient of time for each lake. Bold values indicate significant p-values (<0.05). Asterisks (***) indicate that the lake had a negative chlorophyll trend and a flat or positive phosphorous trend.
| lake | intercept | slope | p.value | rsquared |
|---|---|---|---|---|
| 0202CHA0122 | 0.935502919564683 | -7.46180295662354e-05 | 1.50255775810091e-11 | 0.170538443604458 |
| 0202FIN0153 | 0.635749577648232 | -5.95078577676436e-05 | 0.000174570204931634 | 0.0712769366388961 |
| 0302SOD0096 | 0.48666769255914 | -3.82719485604449e-05 | 0.0106414539140654 | 0.054010468237099 |
| 0403SIL0115 | 0.502854823465325 | -4.10077102364866e-05 | 0.00183205382752675 | 0.0576617991573737 |
| 0602BRA0154 | 1.06149274566162 | -8.76341561974301e-05 | 1.90262833445895e-11 | 0.255844767247596 |
| 0602CRO0073 | -0.0279684006324576 | 2.34944278403732e-06 | 0.846100684208954 | 0.0002519619542885 |
| 0602EAR0146 | 0.851959664465121 | -6.94640784852235e-05 | 4.50387751003881e-06 | 0.124292588234535 |
| 0602EAT0163 | 1.3623772971199 | -0.000112118118047524 | 1.63564369192852e-15 | 0.344014792791017 |
| 0602HAT0155 | 0.593254365900425 | -4.89064089889259e-05 | 2.94395645687317e-05 | 0.106817806226883 |
| 0602LEB0153 | 1.11902446562697 | -9.78474627625744e-05 | 1.5274614012525e-11 | 0.271692914186169 |
| 0602MEL0039 | 0.637271134764219 | -5.08639374049918e-05 | 6.48105278768977e-06 | 0.0866190932257852 |
| 0602MOR0152 | 0.891944581461986 | -7.51301268569576e-05 | 2.05080228334023e-08 | 0.113300646478107 |
| 0602PET0078 | 0.331789456726404 | -2.53168933367702e-05 | 0.0369076621152334 | 0.0234515777766319 |
| 0703CAZ0153 | 0.449504679704982 | -3.63390675897151e-05 | 0.00281617898232705 | 0.0540762194859375 |
| 0703DER0139A | 1.51722873171037 | -0.000124606449692062 | 2.93793620926997e-26 | 0.367298184087431 |
| 0703TUS0153A | 0.949073117601566 | -8.16720216902249e-05 | 2.25647663987516e-10 | 0.148365549026748 |
| 0705COM0333 | 0.673891148746326 | -5.21493736896624e-05 | 0.000318633359493314 | 0.0644480841680051 |
| 0705DUC0222 | 1.14655293221319 | -9.03839962163187e-05 | 8.00695783808632e-07 | 0.170245070080149 |
| 0801SEC0782B | 0.129851054598145 | -1.10468600256644e-05 | 0.246408207564906 | 0.00725564585045498 |
| 0906BLA0001 | 1.10590132446955 | -9.06208131345451e-05 | 6.25783448358662e-08 | 0.127600869619626 |
| 0906BON0024 | 0.114791426411992 | -8.8938623292828e-06 | 0.24538975933229 | 0.00792918186512283 |
| 0906BUT0054 | 0.796388583921064 | -6.64826965822309e-05 | 1.57274435460665e-06 | 0.0944090894268698 |
| 1005GLE0441 | 0.367866424215463 | -2.90789951287784e-05 | 0.0136180282723025 | 0.0361108061211258 |
| 1102BAB1109 | 0.730036058917398 | -5.88644234673036e-05 | 0.000188977510494219 | 0.0604552801273827 |
| 1104GOO0672A | 0.607548239418277 | -5.24133190855349e-05 | 0.00132457826703814 | 0.0722238220775747 |
| 1104SAC0314 | 0.591673654299803 | -5.09535723627264e-05 | 2.465641569285e-05 | 0.128355921621865 |
| 1104SCH0374 | 1.52501684883298 | -0.000116213381005224 | 5.64158238577784e-10 | 0.202904429881148 |
| 1302CAR0062A | 0.435295439195985 | -3.97203176054331e-05 | 0.00432172076770989 | 0.0962057148638055 |
| 1302WAC0117 | 0.0706969591230919 | -5.66313319549469e-06 | 0.602348849006119 | 0.00151966569190007 |
| 1308ROB0902 | 0.72838703360406 | -5.34849245969701e-05 | 0.00432618905257381 | 0.0648110381252874 |
| 1310QUE0057 | 0.50823785735609 | -3.99890117026576e-05 | 0.00447872961409725 | 0.0398814593146023 |
| 1401ANA0251 | 0.467390753646235 | -3.66754111085953e-05 | 0.00387574012362265 | 0.0386700308115073 |
| 1402WOL0037 | 1.12013529268377 | -9.92151035598942e-05 | 1.83213731872522e-08 | 0.231067618390379 |
| 1404OQU0383 | 1.091460025501 | -8.28530634906584e-05 | 7.14167052707765e-08 | 0.150071736337309 |
| 1501LUC0982B | -0.217136136363868 | 1.76125478750812e-05 | 0.380000553377478 | 0.00714356847465044 |
| 1701LFR0662 | 1.42885257134606 | -0.000105374054264106 | 3.28397374780162e-06 | 0.119867716701037 |
Dot plot of slope for each lake. Grey dots are statistically insignificant (p>0.05).